Homework 4, BEE 6940 (Due By 5/4/23, 9:00PM)

Name: Parin Bhaduri

ID: pbb62

Overview

Instructions

Load Environment

The following code loads the environment and makes sure all needed packages are installed. This should be at the start of most Julia scripts.

Load Data

Next, let's load and process the data (as we did in Lab 09). This notebook uses hourly data from the San Francisco, CA tide gauge station, obtained from the University of Hawaii Sea Level Center (Caldwell et al. (2015)).

Problems (100 points)

Problem 1: Stationary Model (40 points)

In this problem, you will fit a stationary GEV model and compute the Akaike and Deviance Information Criteria. You will also look at relevant graphical checks.

Problem 1.1: Fit the model (15 points)

Construct a Turing.jl stationary model with appropriate priors (there's no right choice; remember that thinking about the priors is part of the model checking process). Find the MLE (you'll need this for the AIC) and use MCMC to sample from the posterior.

Problem 1.2: Compute information criteria (10 points)

Compute AIC and DIC for the stationary model.
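As a rough sketch of what these two criteria involve, the block below computes AIC from a maximized log-likelihood and DIC from a set of posterior draws, using only base Julia and the `Statistics` standard library. The helper names (`gev_logpdf`, `loglik`, `aic`, `dic`) are illustrative and not part of Turing.jl, the draws are assumed to be available as `(μ, σ, ξ)` tuples, and DIC is taken as the mean deviance plus the effective number of parameters $p_D = \bar{D} - D(\bar{\theta})$, which is one common definition among several.

```julia
using Statistics  # standard library, provides mean

# GEV log-density for one observation (illustrative helper, written by hand).
function gev_logpdf(y, μ, σ, ξ)
    σ > 0 || return -Inf
    z = (y - μ) / σ
    if abs(ξ) < 1e-10
        return -log(σ) - z - exp(-z)      # Gumbel limit as ξ → 0
    end
    t = 1 + ξ * z
    t > 0 || return -Inf                  # observation outside the support
    return -log(σ) - (1 + 1 / ξ) * log(t) - t^(-1 / ξ)
end

# Log-likelihood of the stationary model for all observations.
loglik(y, μ, σ, ξ) = sum(gev_logpdf(yi, μ, σ, ξ) for yi in y)

# AIC = 2k - 2 log L(θ̂), where k is the number of fitted parameters.
aic(loglik_mle, k) = 2 * k - 2 * loglik_mle

# DIC = mean deviance + p_D, with deviance D(θ) = -2 log L(θ)
# and p_D = mean deviance - deviance at the posterior mean.
function dic(y, draws)
    deviances = [-2 * loglik(y, d...) for d in draws]
    mean_dev = mean(deviances)
    post_mean = (mean(d[1] for d in draws),
                 mean(d[2] for d in draws),
                 mean(d[3] for d in draws))
    p_d = mean_dev + 2 * loglik(y, post_mean...)
    return mean_dev + p_d
end
```

With a fitted model, `aic(loglik(y, μ̂, σ̂, ξ̂), 3)` would use the MLE of the three stationary parameters, and `dic(y, draws)` would take the tuples extracted from the MCMC chain.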

Problem 1.3: Graphical checks (15 points)

Conduct relevant graphical checks you can think of. These could include return periods, credible intervals for the data, or any other statistics that seem appropriate. Explain what each check is for and what your conclusions are about the model based on them. What would you change about the model in 1.1, if anything, based on these checks?

Here, I compare the original distribution of extreme tide gauge values with the posterior distribution calculated from the average parameter values. The posterior follows the original distribution well, peaking around the same value. The dotted lines show the 95% predictive interval for the posterior distribution. The original data are bounded by the interval, although the lower interval has an extended tail reaching upper extreme values not present in the original data. This could be due to a few outliers in the original annual maximum values, which could skew the marginal likelihood towards higher extremes. I may try removing outliers from the original dataset to see if my posterior interval is better constrained.
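Return periods, one of the checks suggested in the prompt, could be examined along the following lines. This is only a sketch in base Julia: `gev_return_level` and `empirical_return_periods` are hypothetical helper names, the return level uses the standard GEV quantile formula, and the empirical return periods assume Weibull plotting positions $i/(n+1)$, which is one convention among several.

```julia
# Return level for a T-year return period under a GEV(μ, σ, ξ),
# i.e. the quantile with non-exceedance probability 1 - 1/T.
function gev_return_level(T, μ, σ, ξ)
    T > 1 || throw(ArgumentError("return period must exceed 1"))
    yp = -log(1 - 1 / T)
    if abs(ξ) < 1e-10
        return μ - σ * log(yp)            # Gumbel limit as ξ → 0
    end
    return μ + σ / ξ * (yp^(-ξ) - 1)
end

# Empirical return periods for observed annual maxima, assuming
# Weibull plotting positions p_i = i / (n + 1) for the i-th smallest value.
function empirical_return_periods(y)
    n = length(y)
    sorted = sort(y)
    periods = [(n + 1) / (n + 1 - i) for i in 1:n]
    return sorted, periods
end
```

Plotting `gev_return_level.(T, μ, σ, ξ)` for a range of `T` (for each posterior draw, or at the posterior mean) against the pairs returned by `empirical_return_periods` would show whether the fitted tail is consistent with the largest observed annual maxima.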

Problem 2: Nonstationary Model (40 points)

Next, we will model the tidal extremes using a non-stationary GEV distribution, where the location parameter (but not the shape or scale) is represented by a linear regression $$\mu_t = \mu_0 + \mu_1 x_t,$$ where $x_t$ is the annual mean Pacific Decadal Oscillation (PDO) index, which is based on the variability of sea-surface temperatures (SSTs) in the North Pacific (versus the El Niño–Southern Oscillation (ENSO), which emphasizes the equatorial SSTs).
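To make the regression on the location parameter concrete, the sketch below evaluates the nonstationary GEV log-likelihood in base Julia. The names `gev_logpdf` and `nonstationary_loglik` are illustrative, not from Turing.jl or Distributions.jl, and the inputs are assumed to be the annual maxima `y` and the matching annual mean PDO values `x`, aligned by year.

```julia
# GEV log-density for one observation (illustrative helper, written by hand).
function gev_logpdf(y, μ, σ, ξ)
    σ > 0 || return -Inf
    z = (y - μ) / σ
    if abs(ξ) < 1e-10
        return -log(σ) - z - exp(-z)      # Gumbel limit as ξ → 0
    end
    t = 1 + ξ * z
    t > 0 || return -Inf                  # observation outside the support
    return -log(σ) - (1 + 1 / ξ) * log(t) - t^(-1 / ξ)
end

# Log-likelihood with a time-varying location μ_t = μ₀ + μ₁ * x_t;
# the scale σ and shape ξ are shared across all years.
function nonstationary_loglik(y, x, μ₀, μ₁, σ, ξ)
    length(y) == length(x) ||
        throw(DimensionMismatch("y and x must have the same length"))
    return sum(gev_logpdf(yt, μ₀ + μ₁ * xt, σ, ξ) for (yt, xt) in zip(y, x))
end
```

Setting `μ₁ = 0` recovers the stationary log-likelihood, which is a convenient sanity check, and it makes explicit that the nonstationary model has four parameters where the stationary one has three.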

First, let's load the PDO index dataset from NOAA.

Problem 2.1: Fit the model (15 points)

Construct a Turing.jl nonstationary model with appropriate priors (there's no right choice; remember that thinking about the priors is part of the model checking process). Find the MLE (you'll need this for the AIC) and use MCMC to sample from the posterior.

Problem 2.2: Compute information criteria (10 points)

Compute AIC and DIC for the nonstationary model.

Problem 2.3: Graphical checks (15 points)

Conduct relevant graphical checks you can think of. These could include return periods, credible intervals for the data, or any other statistics that seem appropriate. Explain what each check is for and what your conclusions are about the model based on them. What would you change about the model in 2.1, if anything, based on these checks?

Once again, I compare the tide gauge data with a sample of values drawn from the GEV distribution with the average parameter values. The density follows the general pattern of the original data, peaking near the same extreme value. The interval captures the data reasonably well, but it seems to underestimate values in the middle of the distribution while overestimating in the tails. However, these values are based on only a single sample from these distributions, so the interval is probably not robust. I may continue to investigate the outliers of my original data in 1.1.
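One way to make the interval less dependent on a single sample would be to simulate from many posterior draws and then take quantiles of the pooled simulations. The sketch below assumes the draws are available as `(μ₀, μ₁, σ, ξ)` tuples and that `x` is the PDO value for the year of interest. `gev_rand` and `predictive_interval` are hypothetical helper names, and sampling uses the inverse GEV CDF and does not call any package.

```julia
using Random, Statistics  # standard libraries: rand with an RNG, quantile

# Draw one GEV(μ, σ, ξ) value by inverting the CDF at a uniform variate.
function gev_rand(rng, μ, σ, ξ)
    u = rand(rng)
    if abs(ξ) < 1e-10
        return μ - σ * log(-log(u))       # Gumbel limit as ξ → 0
    end
    return μ + σ / ξ * ((-log(u))^(-ξ) - 1)
end

# Pool simulations across all posterior draws for covariate value x,
# then report the central predictive interval at the requested level.
function predictive_interval(rng, draws, x; level = 0.95, reps_per_draw = 10)
    sims = [gev_rand(rng, μ₀ + μ₁ * x, σ, ξ)
            for (μ₀, μ₁, σ, ξ) in draws for _ in 1:reps_per_draw]
    α = (1 - level) / 2
    return quantile(sims, α), quantile(sims, 1 - α)
end
```

Because every posterior draw contributes simulations, the resulting interval reflects parameter uncertainty as well as sampling variability, which a single sample at the average parameters does not.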

Problem 3 (20 points)

Based on the information criteria and your graphical checks, what do you think is the relative evidence for dependence of the San Francisco tide gauge extremes on the PDO? Can you draw a conclusion about which model is better? What else could you try?

The DIC for both models is around 1421, and this similarity does not indicate strong evidence for either model. Additionally, both posterior distributions follow the original data quite well. Overall, I don't see any evidence to suggest that including the PDO is necessary when modeling the tide gauge extremes. Both models seem to perform similarly: the stationary GEV seemed to capture the original data better than the nonstationary version, while the nonstationary model had better-constrained tails. From a calibration standpoint, I would probably opt for the stationary GEV model, but if I were more interested in modeling the major extremes of my data, the nonstationary model could be more helpful.